Metadata rework #33

Merged: 16 commits, Aug 22, 2024

@joshmoore (Member) commented Aug 21, 2024

Proposed changes from the 20240821 morning challenge chat:

  • Move current code to a resave subcommand to allow future commands (see below)
  • Make licenses "STRONGLY RECOMMENDED" (via warnings, etc.)
  • Clearly identify the other fields as recommended (SHOULD) or optional (MAY)
  • Remove default values everywhere
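The proposed CLI shape could look roughly like the sketch below, which uses Python's argparse. The existing conversion code moves under a resave subcommand. The license flags (--cc0, --cc-by) stay optional but produce a warning when absent. The recommended fields have no default values. The flag names come from this thread. The functions build_parser and check_metadata are hypothetical, and the real resave.py may be organized differently.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical layout: a top-level parser with one subcommand per task.
    parser = argparse.ArgumentParser(prog="ome2024-ngff-challenge")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # The current conversion code lives under "resave", leaving room for
    # future subcommands (lookup, validate, update, ...).
    resave = subparsers.add_parser("resave", help="convert a Zarr dataset")
    resave.add_argument("input_path")
    resave.add_argument("output_path")

    # License: STRONGLY RECOMMENDED but not required; at most one may be given.
    licenses = resave.add_mutually_exclusive_group()
    licenses.add_argument("--cc0", action="store_true")
    licenses.add_argument("--cc-by", action="store_true")

    # Recommended (SHOULD) fields: no default values, so None means "not set".
    resave.add_argument("--rocrate-organism")
    resave.add_argument("--rocrate-modality")
    return parser


def check_metadata(args: argparse.Namespace) -> list[str]:
    """Warn (rather than fail) about a missing license or recommended fields."""
    messages = []
    if not (args.cc0 or args.cc_by):
        messages.append("no license given: a license is STRONGLY RECOMMENDED")
    if args.rocrate_organism is None:
        messages.append("organism SHOULD be set, e.g. NCBI:txid9606")
    if args.rocrate_modality is None:
        messages.append("modality SHOULD be set, e.g. obo:FBbi_00000251")
    for message in messages:
        warnings.warn(message, stacklevel=2)
    return messages
```

For example, parsing `resave in.zarr out.zarr --cc0` produces two warnings: one for the missing organism and one for the missing modality. The conversion still proceeds.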

Follow-up work will include:

  • Subcommand for looking up organism and modality arguments (e.g., NCBI and FBbi terms)
  • Subcommand for validating datasets, especially that license is set
  • Subcommand for updating metadata in datasets
  • Possibly add a free-text option for organism and modality
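The license part of the planned validation subcommand might reduce to a check like the one below. This sketch makes several assumptions. It reads ro-crate-metadata.json and finds the root data entity through the metadata descriptor's "about" reference, falling back to "./". It then reports whether a license is missing. The function names are hypothetical, and the sketch validates nothing beyond the license.

```python
from __future__ import annotations

import json
from pathlib import Path


def find_root_dataset(crate: dict) -> dict | None:
    """Return the root data entity of an RO-Crate, or None if absent."""
    by_id = {entity.get("@id"): entity for entity in crate.get("@graph", [])}
    # The metadata descriptor points at the root entity via "about".
    descriptor = by_id.get("ro-crate-metadata.json", {})
    root_id = descriptor.get("about", {}).get("@id", "./")
    return by_id.get(root_id)


def license_problems(crate: dict) -> list[str]:
    """List license problems; an empty list means a license is set."""
    root = find_root_dataset(crate)
    if root is None:
        return ["no root dataset entity found"]
    if not root.get("license"):
        return [f"root dataset {root['@id']!r} has no license"]
    return []


def check_zarr(zarr_dir: str | Path) -> list[str]:
    """Hypothetical entry point: check <zarr_dir>/ro-crate-metadata.json."""
    path = Path(zarr_dir) / "ro-crate-metadata.json"
    if not path.exists():
        return [f"{path} not found"]
    with path.open(encoding="utf-8") as handle:
        return license_problems(json.load(handle))
```

Consider the root entity quoted later in this thread for the idr0010 plate, which has only @id, @type and resultOf. This check would return a single "has no license" message for it.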

Resolved review threads (outdated): README.md (1), src/ome2024_ngff_challenge/resave.py (3)
Modality: confocal microscopy {cmd} in.zarr out.zarr --cc0 --rocrate-modality=obo:FBbi_00000251
Modality: two-photon laser scanning {cmd} in.zarr out.zarr --cc0 --rocrate-modality=obo:FBbi_00000253
Modality: scanning electrom microscopy {cmd} in.zarr out.zarr --cc0 --rocrate-modality=obo:FBbi_00000257
Member:

typo: "electrom" should be "electron"
Do we want to validate modality at all? I know that actually validating for real (via look-up) is a pain, but should we at least avoid values like --rocrate-modality=light_sheet by checking that the term starts with, e.g., obo:FBbi_?

Member Author:

🤷🏽 cc: @AybukeKY @sherwoodf

Contributor:

The latter is easiest and can definitely be the starting point. I know I suggested the look-up, but sticking to FBbi may also help us sidestep the wider problem that not everything (e.g., new techniques) is in FBbi.

Contributor:

Well, verifying that it starts with obo:FBbi_ won't solve FBbi not being up to date. But since it's a taxonomy, I'd expect there to be some relevant generic term, even if it's not the most specific one a user would like to have.

I think validating that it starts with obo:FBbi_ (or, I guess, http://purl.obolibrary.org/obo/FBbi_) would probably cover most cases and push people enough to actually go look up values.
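The prefix check suggested above could be written as an argparse type= callable, as in the minimal sketch below. The regular expression (accepted prefix followed by digits) and the name fbbi_modality are assumptions for illustration, not the project's code. The check is purely syntactic and does not confirm that the term exists in FBbi.

```python
import argparse
import re

# Accept the CURIE form or the full PURL form, followed by the numeric FBbi
# identifier. Syntactic check only: the term is not looked up.
FBBI_TERM = re.compile(r"(?:obo:|http://purl\.obolibrary\.org/obo/)FBbi_\d+")


def fbbi_modality(value: str) -> str:
    """argparse type= callable that rejects values which are not FBbi terms."""
    if FBBI_TERM.fullmatch(value) is None:
        raise argparse.ArgumentTypeError(
            f"{value!r} is not an FBbi term (expected e.g. obo:FBbi_00000251)"
        )
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--rocrate-modality", type=fbbi_modality)
```

With this in place, --rocrate-modality=light_sheet makes argparse exit with a usage error instead of silently recording a term that cannot be resolved.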

Member Author:

Side note: I have this working locally:

ome2024-ngff-challenge lookup light-sheet
ONTOLOGY  	TERM                	LABEL                         	DESCRIPTION
fbbi      	FBbi_00000364       	light-sheet illumination      	illumination in the form of a thin sheet of light directed perpendicul
fma       	FMA_50608           	Sheet
snomed    	SNOMED_257402009    	Polytetrafluoroethylene sheet
envo      	ENVO_01000507       	iron sheet                    	An iron sheet is a mass of iron which has been forged into a roughly p
envo      	ENVO_00000132       	ice sheet                     	A glacier which covers an area of greater than 50,000 square kilometer
go        	GO_0098646          	collagen sheet                	A protein complex that consists of collagen triple helices associated
foodon    	FOODON_03301340     	rice sheet                    	SIREN DB annotation:
ncit      	NCIT_C13802         	Beta Sheet                    	In a b-sheet two or more polypeptide chains run alongside each other a
uberon    	UBERON_0010136      	epithelial sheet              	An epithelial sheet is a flat surface consisting of closely packed epi
snomed    	SNOMED_467440008    	Cranioplasty sheet
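One way to turn lookup output like this into a --rocrate-modality value is to keep only the FBbi rows and prefix the term with obo:. That matches the obo:FBbi_... form used elsewhere in this thread. The sketch below assumes the output is tab-separated with a header row, as it appears above. The lookup output format is not documented here, so treat the parsing and the name fbbi_candidates as hypothetical.

```python
def fbbi_candidates(table: str) -> list[tuple[str, str]]:
    """Return (CURIE, label) pairs for FBbi rows in tab-separated lookup output."""
    candidates = []
    lines = table.strip().splitlines()
    for line in lines[1:]:  # skip the ONTOLOGY/TERM/LABEL/DESCRIPTION header
        cells = [cell.strip() for cell in line.split("\t")]
        if len(cells) < 3:
            continue  # not a result row
        ontology, term, label = cells[:3]
        if ontology.lower() == "fbbi":
            # FBbi_00000364 -> obo:FBbi_00000364, the form passed to --rocrate-modality
            candidates.append((f"obo:{term}", label))
    return candidates


sample = (
    "ONTOLOGY\tTERM\tLABEL\tDESCRIPTION\n"
    "fbbi\tFBbi_00000364\tlight-sheet illumination\tillumination in the form of a thin sheet of light\n"
    "fma\tFMA_50608\tSheet\n"
)
print(fbbi_candidates(sample))  # [('obo:FBbi_00000364', 'light-sheet illumination')]
```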

Member Author:

I've created #34 to capture this so I can move forward with this PR.

@joshmoore (Member Author):

Thanks, @will-moore. Fixes pushed for most of the above. Another question I have is whether to move from --rocrate- to --metadata- everywhere.

@will-moore (Member):

@joshmoore I don't mind "rocrate" or "metadata", but if the args stay --rocrate-, then maybe we should mention why "rocrate".
Actually, the README could do with a short "Metadata" section that simply says "Metadata is written as RO-Crate to ro-crate-metadata.json", gives an example, and says "See ome2024-ngff-challenge resave -h for more info" (assuming we don't want to duplicate the help in the README).

E.g.

ome2024-ngff-challenge resave in.zarr out.zarr --cc-by --rocrate-organism=NCBI:txid10090 --rocrate-modality=obo:FBbi_00000251
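For the suggested README section, the sketch below shows a minimal RO-Crate 1.1 document of the kind that could end up in ro-crate-metadata.json. It rests on several assumptions. Only the license is recorded, because this thread does not show how the tool encodes organism and modality. The Creative Commons URLs are the standard CC0 1.0 and CC BY 4.0 deeds. minimal_crate and write_crate are hypothetical helpers, not the tool's API.

```python
from __future__ import annotations

import json
from pathlib import Path

# Assumed mapping from license flags to standard Creative Commons URLs.
LICENSE_URLS = {
    "cc0": "https://creativecommons.org/publicdomain/zero/1.0/",
    "cc-by": "https://creativecommons.org/licenses/by/4.0/",
}


def minimal_crate(license_flag: str | None) -> dict:
    """Build a minimal RO-Crate 1.1 metadata document (license only)."""
    root = {"@id": "./", "@type": "Dataset"}
    if license_flag is not None:
        # No default: the license is only written when explicitly chosen.
        root["license"] = {"@id": LICENSE_URLS[license_flag]}
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: says which entity the file describes.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            root,
        ],
    }


def write_crate(zarr_dir: str | Path, license_flag: str | None) -> Path:
    """Write ro-crate-metadata.json at the top level of the output Zarr."""
    path = Path(zarr_dir) / "ro-crate-metadata.json"
    path.write_text(json.dumps(minimal_crate(license_flag), indent=2))
    return path
```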

@will-moore (Member):

idr0010 plate exported with commit c122913 above:
Strangely, the license is missing from:

{
  "@id": "./",
  "@type": "Dataset",
  "resultOf": {
    "@id": "#9c74acf1-aa49-434a-9635-26ef724d9772"
  }
},

https://deploy-preview-36--ome-ngff-validator.netlify.app/?source=https://uk1s3.embassy.ebi.ac.uk/idr/share/ome2024-ngff-challenge/0.0.5/idr0010/76-45.zarr

converted with:

time ome2024-ngff-challenge resave --input-bucket=bia-integrator-data --input-endpoint=https://uk1s3.embassy.ebi.ac.uk --input-anon S-BIAD885/0046b0d0-f20b-4482-84b1-4b2b154865fd/0046b0d0-f20b-4482-84b1-4b2b154865fd.zarr /data/will/idr0010/76-45.zarr --log debug --rocrate-modality=obo:FBbi_00000246 --rocrate-organism=NCBI:txid9606 --cc-by

But I can't reproduce that; it's working fine with other images and plates... 👍

@joshmoore mentioned this pull request Aug 22, 2024
@joshmoore merged commit 766aaca into ome:main Aug 22, 2024
5 checks passed
@joshmoore deleted the metadata-rework branch August 22, 2024 09:24

5 participants